International Journal of Electronics, 
Communication & Instrumentation Engineering 
Research and Development (IJECIERD) 
ISSN(P): 2249-684X; ISSN(E): 2249-7951 
Vol. 4, Issue 1, Feb 2014, 37- 46 
© TJPRC Pvt. Ltd. 




DESIGN AND IMPLEMENTATION OF HIGH SPEED SIGNED MULTIPLIER USING 3_2 COMPRESSOR

D. SRINU 1 , S. RAMBABU 2 & G. LEENENDRA CHOWDARY 3 

1 Research Scholar, Department of ECE, SITE, Tadepalligudem, Andhra Pradesh, India 
2,3 Assistant Professor, Department of ECE, SITE, Tadepalligudem, Andhra Pradesh, India 



Multipliers play an important role in today's digital signal processing and various other applications. With advances 
in technology, many researchers have tried and are trying to design multipliers which offer one or more of the following 
design targets: high speed, low power consumption, and regularity of layout and hence less area for compact VLSI 
implementation. The multiplier presented here is based on the ancient algorithms (sutras) for multiplication [1], 
specifically the sutra called Urdhava Tiryakbhyam. These sutras are meant for faster mental calculation. When implemented 
in hardware, the method is fast and also consumes less area. This paper presents a technique to modify the architecture 
of the Urdhava Tiryakbhyam multiplier by using a compressor in order to reduce area and delay and so improve overall 
performance. The coding is done for 16 bit (Q15), 32 bit (Q31) and 64 bit (Q63) fractional fixed point multiplications 
using Verilog HDL and synthesized using Xilinx ISE version 9.2i. The performance is compared in terms of area and delay 
with the earlier existing architecture of the Vedic multiplier. The proposed design shows improvements in both area and 
time delay. 
INTRODUCTION 

Vedic Mathematics is the ancient system of mathematics which was rediscovered early last century by Sri Bharati 
Krisna Tirthaji [1]. The Sanskrit word "Veda" means "knowledge". Tirthaji organized and classified the whole of Vedic 
Mathematics into 16 formulae, also called sutras. These formulae form the backbone of Vedic mathematics. 
A great amount of research has been done over the years to implement algorithms of Vedic mathematics on digital processors. 

Multiplication is currently implemented in many Digital Signal Processing (DSP) applications such as convolution, Fast 
Fourier Transform (FFT) and filtering, and in the arithmetic and logic units of microprocessors [3]. For multiplication 
algorithms performed in DSP applications, latency and throughput are the two major concerns from the delay perspective. 

The algorithm to architecture mapping using floating point number representation consumes more hardware, which 
tends to be expensive. To overcome this drawback we chose fixed point number representation, which is a good option to 
implement at silicon level [2]. Hence our focus in this work is to develop optimized hardware modules for the multiplication 
operation using fixed point representation. The 16 bit Q15, 32 bit Q31 and 64 bit Q63 formats provide the required precision 
for most digital signal processing applications and are well suited for implementation on processors. In this paper we 
propose the implementation of a fixed point Q-format [6] high speed multiplier using the compressed Urdhava Tiryakbhyam 
method of Vedic mathematics. The results show that the compressed Urdhava Tiryakbhyam method is well suited for 
implementing multipliers. 
FIXED POINT ARITHMETIC 

An N-bit fixed-point number can be interpreted as either an integer (e.g., 20645) or a fractional number (e.g., 0.75). 
An N-bit fixed point, 2's complement integer is represented as 

$x = -b_{N-1} 2^{N-1} + b_{N-2} 2^{N-2} + \cdots + b_0 2^0$ 

Integer fixed point is difficult to use in processors due to possible overflow. In a 16-bit processor the dynamic range 
is -32,768 to 32,767. 

Example 

200 x 350 = 70000, which is an overflow! To overcome this drawback, fractional fixed-point representation is used, 
which is suitable for DSP algorithms. The fractional number range is between -1 and 1. Multiplying a fraction by a fraction 
always results in a fraction and will not produce an overflow (e.g., 0.99 x 0.9999 is less than 1), although successive 
additions may cause overflow. The representable numbers lie between -1.0 and $1 - 2^{-(N-1)}$, where N is the number of bits. 
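
The contrast between the two interpretations can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `fits_int16` is an assumption introduced here, and ordinary Python floats stand in for the fractional values.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # dynamic range of a 16-bit signed integer


def fits_int16(value: int) -> bool:
    """Return True if value is representable as a 16-bit signed integer."""
    return INT16_MIN <= value <= INT16_MAX


# Integer interpretation: the product leaves the 16-bit range.
integer_product = 200 * 350
print(integer_product, fits_int16(integer_product))

# Fractional interpretation: the product of two fractions stays inside (-1, 1).
fraction_product = 0.99 * 0.9999
print(fraction_product, abs(fraction_product) < 1.0)

# Largest positive fraction for N = 16 bits: 1 - 2^-(N-1).
N = 16
largest_fraction = 1 - 2.0 ** -(N - 1)
print(largest_fraction)
```
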

Q-Format Representation 

In general, any Q-format representation is denoted by the Qm.n notation, with m bits for the integer portion and n bits 
for the fractional portion. The total number of bits is N = m + n + 1 for signed numbers. 

Example: a 16-bit number (N = 16) in Q2.13 format has 2 bits for the integer portion, 13 bits for the fractional portion 
and 1 sign bit (MSB). 

Special cases: a 16-bit integer number (N = 16) is in Q15.0 format; a 16-bit fractional number (N = 16) is in Q0.15 format, 
also known as Q.15 or Q15. 
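
As a rough illustration of the Qm.n notation, the following Python sketch converts between real values and two's complement Qm.n words. The function names `to_q` and `from_q` are hypothetical, and rounding to the nearest code is an assumption made here, since the text does not specify a rounding rule.

```python
def to_q(value: float, m: int, n: int) -> int:
    """Encode value as a signed Qm.n word of N = m + n + 1 bits (two's complement)."""
    total_bits = m + n + 1
    scaled = round(value * (1 << n))          # assumption: round to nearest code
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    if not lowest <= scaled <= highest:
        raise OverflowError(f"{value} does not fit in Q{m}.{n}")
    return scaled & ((1 << total_bits) - 1)   # keep only the N-bit pattern


def from_q(word: int, m: int, n: int) -> float:
    """Decode an N-bit two's complement Qm.n word back to a real value."""
    total_bits = m + n + 1
    if word & (1 << (total_bits - 1)):        # sign bit (MSB) set
        word -= 1 << total_bits
    return word / (1 << n)


print(hex(to_q(0.75, 0, 15)))     # Q15 (Q0.15) word for 0.75
print(hex(to_q(-0.75, 0, 15)))    # Q15 word for -0.75
print(hex(to_q(1.5, 2, 13)))      # Q2.13 word for 1.5
print(from_q(0xA000, 0, 15))      # decode a Q15 word
```
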

Q-Format Multiplication 

When two Q15 numbers are multiplied, their product is 32 bits long, as illustrated in Figure 1. The product has a 
redundant or extended sign bit. Since the product stored in memory should also be a Q15 number, we left shift the product by 
one bit and the most significant 16 bits (including the sign bit) are stored in memory. 
Figure 1: Multiplication of Two Q15 Format Numbers 
Yielding the Product in Q15 Format Itself 

The product of two Q15 numbers is in Q30 format, so the 32-bit product has two bits in front of the binary point. 
An N x N bit signed multiplication yields a 2N-1 bit result, so the additional MSB is a sign extension bit. Typically 
only the most significant 15 bits (plus the sign bit) are stored back into memory, so the write operation requires a left 
shift by one. 
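
The shift-and-truncate step can be sketched in Python as follows. The names `to_signed` and `q15_mul` are hypothetical helpers introduced for illustration; the sketch truncates the low bits without rounding, which is an assumption consistent with keeping only the most significant 16 bits.

```python
def to_signed(word: int, bits: int) -> int:
    """Interpret a bits-wide two's complement pattern as a signed integer."""
    return word - (1 << bits) if word & (1 << (bits - 1)) else word


def q15_mul(x: int, y: int) -> int:
    """Multiply two 16-bit Q15 words and return a 16-bit Q15 word."""
    product = to_signed(x, 16) * to_signed(y, 16)  # Q30 value with a duplicated sign bit
    product32 = product & 0xFFFFFFFF               # 32-bit two's complement pattern
    shifted = (product32 << 1) & 0xFFFFFFFF        # left shift drops the redundant sign bit
    return shifted >> 16                           # keep the most significant 16 bits


print(hex(q15_mul(0x4000, 0x4000)))   # 0.5 * 0.5
print(hex(q15_mul(0xA000, 0x2000)))   # -0.75 * 0.25
```

One corner case is worth noting: -1.0 x -1.0 would be +1.0, which is not representable in Q15, so this sketch wraps around to 0x8000 for that input pair.
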
URDHAVA TIRYAKBHYAM METHOD 

Urdhava Tiryakbhyam [2] is a Sanskrit term which means "vertically and crosswise" in English. The method is a 
general multiplication formula applicable to all cases of multiplication. It is based on a novel concept through which all 
partial products are generated concurrently. 

Figure 2 demonstrates a 4 x 4 binary multiplication using this method. The method can be generalized for any N x N 
bit multiplication. This type of multiplier is independent of the clock frequency of the processor because the partial products 
and their sums are calculated in parallel. The net advantage is that it reduces the need for microprocessors to operate at 
increasingly higher clock frequencies. As the operating frequency of a processor increases, the number of switching instances 
also increases. This results in more power consumption and also dissipation in the form of heat, which results in higher device 
operating temperatures. Another advantage of the Urdhava Tiryakbhyam multiplier is its scalability. The processing power can 
easily be increased by increasing the input and output data bus widths since it has a regular structure. Due to its regular 
structure, it can be easily laid out on a silicon chip and also consumes optimum area. As the number of input bits increases, 
gate delay and area increase very slowly compared to other multipliers. Therefore the Urdhava Tiryakbhyam multiplier is time, 
space and power efficient. 

In step 1 of Figure 2, the least significant bit (LSB) of the multiplier is multiplied with the least significant bit of 
the multiplicand (vertical multiplication). This result forms the LSB of the product. In step 2, the next higher bit of the 
multiplier is multiplied with the LSB of the multiplicand and the LSB of the multiplier is multiplied with the next higher 
bit of the multiplicand (crosswise multiplication). These two partial products are added; the LSB of the sum is the next 
higher bit of the final product and the remaining bits are carried to the next step. The partial products and their sums for 
every step can be calculated in parallel. 
Figure 2: Multiplication of Two 4 Bit Numbers Using Urdhava Tiryakbhyam Method [7] 

Thus every step in Figure 2 has a corresponding expression as follows: 

$r_0 = a_0 b_0$ (1) 

$c_1 r_1 = a_1 b_0 + a_0 b_1$ (2) 

$c_2 r_2 = c_1 + a_2 b_0 + a_1 b_1 + a_0 b_2$ (3) 

$c_3 r_3 = c_2 + a_3 b_0 + a_2 b_1 + a_1 b_2 + a_0 b_3$ (4) 

$c_4 r_4 = c_3 + a_3 b_1 + a_2 b_2 + a_1 b_3$ (5) 

$c_5 r_5 = c_4 + a_3 b_2 + a_2 b_3$ (6) 

$c_6 r_6 = c_5 + a_3 b_3$ (7) 

with $c_6 r_6 r_5 r_4 r_3 r_2 r_1 r_0$ being the final product [5]. 
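
The column-wise structure of expressions (1) to (7) can be mirrored in a short Python model. This is a behavioural sketch only, not the authors' Verilog: the function name `urdhva_4x4` is hypothetical, and the carry is kept as an ordinary integer because the intermediate carries in the middle columns can be wider than one bit.

```python
def urdhva_4x4(a: int, b: int) -> int:
    """Multiply two 4-bit unsigned numbers column by column (vertical and crosswise)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    carry = 0
    product = 0
    for k in range(7):  # one column per expression (1)..(7)
        # Crosswise terms a_i * b_(k-i) of column k, plus the carry from column k-1.
        column = carry + sum(
            a_bits[i] * b_bits[k - i] for i in range(4) if 0 <= k - i < 4
        )
        product |= (column & 1) << k   # r_k is the LSB of the column sum
        carry = column >> 1            # remaining bits move to the next column
    product |= carry << 7              # c_6 becomes the most significant product bit
    return product


print(urdhva_4x4(13, 11))
```

In hardware all crosswise products of a column are formed at the same time; the loop here serialises the columns only because it is a software model.
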
ARCHITECTURE 

Our design of the Q-format signed multiplier includes the Urdhava Tiryakbhyam integer multiplier [4] with certain 
modifications, as follows. 

A. 3_2 Compressor 

High speed multipliers use 3-2, 4-2 and 5-2 compressors to lower the latency of the partial product reduction stage [8]. 
These compressors are used to minimize delay and area, which increases the performance of the overall system. 
Compressors are generally designed using XOR-XNOR gates and multiplexers. A compressor is a device which is used to reduce 
the number of operands while adding terms of partial products in multipliers. An X-Y compressor takes X equally weighted 
input bits and produces a Y-bit binary number. The simplest and most widely used compressor is the 3-2 compressor, which is 
also known as a full adder. A 3-2 compressor has three inputs X1, X2, X3 and generates two outputs, the sum and the carry 
bits. The block diagram of the 3-2 compressor is shown in Figure 3(a). 
Figure 3: (a) A 3_2 Compressor 
Figure 3: (b) 3_2 Compressor Truth Table 
Figure 3: (c) Conventional 3_2 Compressor 

The conventional architecture of the 3-2 compressor, shown in Figure 3(c), has two XOR gates in the critical path. 
The sum output is generated by the second XOR and the carry output is generated by the multiplexer (MUX). 
The equations governing the conventional 3-2 compressor outputs are shown below: 



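$X_1 + X_2 + X_3 = \text{Sum} + 2 \cdot \text{Carry}$ (1) 

$\text{Sum} = X_1 \oplus X_2 \oplus X_3$ (2) 

$\text{Carry} = (X_1 \oplus X_2) \cdot X_3 + \overline{(X_1 \oplus X_2)} \cdot X_1$ (3) 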
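
The three compressor relations can be checked exhaustively with a small Python model. The function name `compressor_3_2` is hypothetical; the carry is written as a 2-to-1 multiplexer selected by the first XOR output, matching the description of the conventional architecture.

```python
from itertools import product


def compressor_3_2(x1: int, x2: int, x3: int) -> tuple[int, int]:
    """Return (sum, carry) of a 3-2 compressor built from two XORs and a MUX."""
    select = x1 ^ x2                  # first XOR, also the MUX select line
    total = select ^ x3               # second XOR produces the sum bit
    carry = x3 if select else x1      # MUX: X3 when select is 1, otherwise X1
    return total, carry


# Print the truth table and confirm X1 + X2 + X3 = Sum + 2 * Carry for every row.
for x1, x2, x3 in product((0, 1), repeat=3):
    s, c = compressor_3_2(x1, x2, x3)
    print(x1, x2, x3, "->", s, c)
    assert x1 + x2 + x3 == s + 2 * c
```
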

This compressor based multiplier is faster since all the partial products are computed concurrently. Consider a 16 
bit Q15 multiplier, whose product is also a Q15 number, 16 bits long. First, if the MSB of an input is 1 then it is a negative 
number, so the 2's complement of the number is taken before proceeding with the multiplication. Since the MSB denotes the 
sign, it is excluded and a '0' is placed in this position while multiplying. A Q15 format multiplier consists of four 8x8 
Urdhava multipliers and the resulting product is 32 bits long, as shown in Figure 4. But the product of two Q15 numbers is 
also a Q15 number, which should be 16 bits long. 

Therefore the 32 bit product is left shifted by 1 bit to remove the redundant sign bit, and only the most significant 16 
bits of this product are kept as the final product. An XOR operation is performed on the input sign bits to determine the 
sign of the result; if the output is '1', it enables the conversion of the 16 bit final result to its 2's complement 
format, indicating a negative product. 
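
The data path described above can be sketched behaviourally in Python. All function names are hypothetical, and `mul8` uses Python's built-in multiplication as a stand-in for an 8x8 Urdhava block; the sketch only illustrates how the four 8x8 products, the shift and the sign handling fit together.

```python
def mul8(a: int, b: int) -> int:
    """Stand-in for one 8x8 unsigned multiplier block (hypothetical helper)."""
    return (a & 0xFF) * (b & 0xFF)


def mul16_from_four_8x8(a: int, b: int) -> int:
    """Build a 16x16 unsigned product from four 8x8 products."""
    a_lo, a_hi = a & 0xFF, (a >> 8) & 0xFF
    b_lo, b_hi = b & 0xFF, (b >> 8) & 0xFF
    return (mul8(a_lo, b_lo)
            + ((mul8(a_hi, b_lo) + mul8(a_lo, b_hi)) << 8)
            + (mul8(a_hi, b_hi) << 16))


def q15_sign_magnitude_multiply(x: int, y: int) -> int:
    """Multiply two 16-bit Q15 words following the steps described above."""
    sign_x, sign_y = (x >> 15) & 1, (y >> 15) & 1
    # Two's complement of negative inputs, then a '0' is placed in the MSB.
    mag_x = ((-x) & 0xFFFF if sign_x else x) & 0x7FFF
    mag_y = ((-y) & 0xFFFF if sign_y else y) & 0x7FFF
    product32 = mul16_from_four_8x8(mag_x, mag_y)   # 32-bit product
    shifted = (product32 << 1) & 0xFFFFFFFF         # remove the redundant sign bit
    result = shifted >> 16                          # most significant 16 bits
    if sign_x ^ sign_y:                             # XOR of the input sign bits
        result = (-result) & 0xFFFF                 # 2's complement of the result
    return result


print(hex(q15_sign_magnitude_multiply(0xA000, 0xE000)))  # -0.75 * -0.25
print(hex(q15_sign_magnitude_multiply(0xA000, 0x2000)))  # -0.75 *  0.25
```

Because the magnitude is truncated before the sign is restored, negative results in this sketch round toward zero. Forcing the MSB to 0 also means the input -1.0 (0x8000) loses its magnitude, so that value needs separate handling.
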
Figure 4: Architecture of a Q15 Format Multiplier. Multiplication of Two Q15 Numbers 
X and Y Results in a Q15 Product Denoted by P in the Figure 
Figure 5: Architecture of a Q31 Format Multiplier. Multiplication of Two Q31 Numbers 
X and Y Results in a Q31 Product Denoted by P in the Figure 
Similarly, as shown in Figure 5, the product of two Q31 numbers is also a Q31 number, which should be 32 bits long. 
Therefore the 64 bit product is left shifted by 1 bit to remove the redundant sign bit, and only the most significant 32 bits 
of this product are kept as the final product. An XOR operation is performed on the input sign bits to determine the sign of 
the result. If the output is '1', it enables the conversion of the 32 bit final result to its 2's complement format, 
indicating a negative product. 

Similarly, as shown in Figure 6, the product of two Q63 numbers is also a Q63 number, which should be 64 bits long. 
Therefore the 128 bit product is left shifted by 1 bit to remove the redundant sign bit, and only the most significant 64 
bits of this product are kept as the final product. An XOR operation is performed on the input sign bits to determine 
the sign of the result. If the output is '1', it enables the conversion of the 64 bit final result to its 2's complement 
format, indicating a negative product. 
Figure 6: Architecture of a Q63 Format Multiplier. Multiplication of Two Q63 Numbers 
X and Y Results in a Q63 Product Denoted by P in the Figure 
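
Since the Q15, Q31 and Q63 multipliers follow the same steps and differ only in word length, the flow can be written once with the width as a parameter. The Python sketch below is a behavioural illustration under that assumption; the name `q_format_multiply` and its `bits` parameter are hypothetical, and Python's built-in multiplication stands in for the Urdhava multiplier core.

```python
def q_format_multiply(x: int, y: int, bits: int) -> int:
    """Multiply two signed fractional words of the given width (16, 32 or 64 bits)."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    negative = bool((x ^ y) & sign_bit)              # XOR of the input sign bits
    # Two's complement of negative inputs, with the MSB then forced to 0.
    mag_x = ((-x) & mask if x & sign_bit else x) & (sign_bit - 1)
    mag_y = ((-y) & mask if y & sign_bit else y) & (sign_bit - 1)
    product = mag_x * mag_y                          # 2 * bits wide
    shifted = (product << 1) & ((1 << (2 * bits)) - 1)  # drop the redundant sign bit
    result = shifted >> bits                         # most significant half
    if negative:
        result = (-result) & mask                    # back to 2's complement
    return result


print(hex(q_format_multiply(0xA000, 0xE000, 16)))                          # Q15
print(hex(q_format_multiply(0x40000000, 0x40000000, 32)))                  # Q31
print(hex(q_format_multiply(0xC000000000000000, 0x4000000000000000, 64)))  # Q63
```
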

IMPLEMENTATION AND RESULTS 

The proposed compressed Urdhava Tiryakbhyam Q-format multiplier is designed using Verilog HDL in a structural 
coding style. The basic block of both the Q15 and Q31 multipliers is a 4 x 4 Urdhava Tiryakbhyam integer multiplier, which 
in turn is made up of two 2x2 multiplier blocks. The code is completely synthesized using Xilinx XST and implemented on 
device family Virtex-5, device XC5VLX50, package FF324 with speed grade -2. 

Simulation Results 

The design was simulated using ISim on Xilinx ISE version 9.2i. 
For Q15 format multiplication, as shown in Figure 7: 
Input1 = -0.75 = 1010 0000 0000 0000 
Input2 = -0.25 = 1100 0000 0000 0000 
Output = 0.1875 = 0001 1000 0000 0000 
Figure 7: Q15 Multiplication 

For Q31 format multiplication, as shown in Figure 8: 
Input1 = 0.333333 = 0010 1010 1010 1010 1010 0111 1101 1111 
Input2 = -0.666666 = 1010 1010 1010 1010 1011 0000 0100 0010 
Output = -0.2222217777743935585021972655625 = 1111 1000 1110 0011 0001 1011 1000 0101 

But the actual value of the product is -0.222221777778. Therefore precision loss is involved in this multiplication and 
is found to be 3.60644E-12, which is less than the resolution of the Q31 representation, i.e. $2^{-31}$. Thus it provides a 
32 bit accurate product, which is acceptable for most DSP applications. 
Figure 8: Q31 Multiplication 
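
The precision-loss figure quoted for the Q31 example can be reproduced from the two decimal values given in the text. The short Python check below uses exact rational arithmetic; the variable names are illustrative, and the "exact" value is taken as printed in the text (rounded to 12 decimal places).

```python
from fractions import Fraction

# Values as quoted in the text for the Q31 example.
reported = Fraction("-0.2222217777743935585021972655625")  # multiplier output
exact = Fraction("-0.222221777778")                         # product as stated in the text

precision_loss = abs(exact - reported)
resolution = Fraction(1, 2 ** 31)                           # one Q31 LSB

print(float(precision_loss))        # about 3.6e-12
print(float(resolution))            # about 4.66e-10
print(precision_loss < resolution)  # loss is below one Q31 LSB
```
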

For Q63 format multiplication, as shown in Figure 9: 

Input1 = 0000111111110000111100001111000011110000111100001111000011110000 
Input2 = 1010101010101010101010101010101010101010101010101010101010101010 
Output = 

101000001010000010100000101000001010000010100000101000001001111101011111010111110101111101 
01111101011111010111110101111101100000 
Figure 9: Q63 Multiplication 
RTL SCHEMATICS 




Figure 10: RTL Schematic of 16 BIT 




Figure 11: RTL Schematic of 32 BIT 




Figure 12: RTL Schematic of 64 BIT 



Table 1: Comparison of Area Occupied and Speed of Various 
Multiplier Architectures for 16 Bit 

Algorithm used                          LUTs used   Total LUTs present   % of area occupied   Frequency (MHz)   Time (ns) 
Urdhava Tiryakbhyam                     425         28800                1.47                 70.24             14.236 
Compressor based Urdhava Tiryakbhyam    421         28800                1.46                 77.88             12.840 






Table 2: Comparison of Area Occupied and Speed of Various 
Multiplier Architectures for 32 Bit 
Table 3: Comparison of Area Occupied and Speed of Various 
Multiplier Architectures for 64 Bit 
CONCLUSIONS 

This paper proposed a fast multiplier architecture for signed Q-format multiplication using the compressor based 
Urdhava Tiryakbhyam method of Vedic mathematics. Since the Q-format representation is widely used in Digital Signal 
Processors, the proposed compressed Urdhava Tiryakbhyam method can speed up the multiplication operation, 
which is the basic hardware block. The proposed multipliers occupy less area and are faster than those based on the plain 
Urdhava Tiryakbhyam method. Therefore the compressed Urdhava Tiryakbhyam Q-format multiplier is well suited for digital 
signal processing applications requiring faster multiplications. 